What is claimed is: 



1. A method comprising parallelizing execution of loop computations that involve indirectly accessed sparse arrays/matrices using software-pipelining to increase instruction-level parallelism and decrease initiation interval.

2. The method of claim 1, wherein software-pipelining comprises:
parallelizing dependence loop body instructions between the subsequent independent iterations.

3. The method of claim 2, wherein decreasing initiation interval comprises:
increasing resource initiation interval and recurrence initiation interval, wherein resource initiation interval is based on resource usage of the dependence loop and available processor resources and recurrence initiation interval is based on the instructions in the dependence loop body and latencies of the processor.
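
By way of illustration only, the two bounds named in claim 3 can be sketched with the textbook modulo-scheduling formulas: the resource bound is the largest ratio of uses to available units over all resource classes, the recurrence bound is the largest ratio of cycle latency to iteration distance over all dependence cycles, and the achievable initiation interval is at least the larger of the two. The function names, resource classes, and numbers below are hypothetical and are not taken from the claims.

```python
import math

def resource_ii(uses_per_iteration, units_available):
    """Resource bound: the busiest resource class limits how often an iteration can start."""
    return max(
        math.ceil(uses_per_iteration[r] / units_available[r])
        for r in uses_per_iteration
    )

def recurrence_ii(cycles):
    """Recurrence bound. `cycles` holds (total_latency, iteration_distance) per dependence cycle."""
    return max((math.ceil(lat / dist) for lat, dist in cycles), default=0)

def minimum_ii(uses_per_iteration, units_available, cycles):
    """Lower bound on the initiation interval: the larger of the two bounds."""
    return max(resource_ii(uses_per_iteration, units_available),
               recurrence_ii(cycles))

# Hypothetical machine: 2 memory ports, 2 floating-point units.
uses = {"mem": 3, "fp": 1}
units = {"mem": 2, "fp": 2}

# A load -> multiply-add -> store cycle of 9 cycles carried to the very next iteration.
print(minimum_ii(uses, units, [(9, 1)]))
# The same cycle if the dependence only spans three iterations (assumed for illustration).
print(minimum_ii(uses, units, [(9, 3)]))
```

In this sketch the recurrence bound dominates when the dependence is carried to the next iteration, and it falls toward the resource bound once the dependence distance grows.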

4. The method of claim 1, wherein the instructions are based on clock cycles.

5. A method comprising parallelizing execution of loop computations, by transforming dependence loop instructions into multiple independent iterations, in sparse arrays/matrices using software-pipelining to break recurrence initiation interval such that recurrence initiation interval is substantially closer to resource initiation interval.

6. The method of claim 5, wherein using the software-pipelining to break the recurrence initiation interval comprises implementing loop-body instructions in parallel in the multiple independent iterations.

7. The method of claim 6, wherein the loop-body instructions comprise loop-body cycles.

8. The method of claim 7, wherein the resource initiation interval is based on resource usage of the loop and available processor resources and the recurrence initiation interval is based on the cycles in the dependence loop and latencies of the processor.

9. A method comprising:
performing a run time dependency check during a current iteration using prior computed values obtained from a predetermined number of previous adjacent iterations in a sparse array matrix; and
parallelizing dependence loop instructions between the current and subsequent multiple iterations by using the computed values to make the dependence loop instructions into independent loop iterations so as to increase instruction-level parallelism and reduce recurrence initiation interval in the current iteration as a function of the run time dependency check in the current iteration.

10. The method of claim 9, wherein the prior computed values are based on virtual unrolling using a virtual unrolling factor.

11. The method of claim 10, wherein the virtual unrolling factor is three previous iterations.

12. The method of claim 10, wherein the virtual unrolling factor is computed using the recurrence initiation interval and latency in number of cycles of a floating point multiply add operation in the sparse array matrix.

13. The method of claim 9, further comprising:
assigning the prior computed values to a predetermined number of adjacent registers.

14. The method of claim 12, further comprising:
performing register rotation to include computed values obtained from the current iteration; and
repeating performing the run time dependency check and parallelizing the dependence loop instructions to increase the instruction-level parallelism and reduce the recurrence initiation interval in a next iteration.
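
By way of illustration only, the run time dependency check and register rotation recited in claims 9 to 14 can be emulated in software for a hypothetical scatter-add loop of the form a[idx[i]] = a[idx[i]] + c[i]. The sketch assumes that a store reaches memory only after a fixed number of later iterations have started (here called `unroll`, standing in for the virtual unrolling factor), so an early load may read a stale value. The check compares the current index with the indices of the previous iterations held in a small rotating buffer and, on a match, forwards the buffered value. All names are assumptions, and the model describes behavior only, not an actual instruction schedule.

```python
def scatter_add_reference(a, idx, c):
    """Plain sequential loop: each iteration sees every earlier store."""
    out = list(a)
    for i, k in enumerate(idx):
        out[k] = out[k] + c[i]
    return out

def scatter_add_forwarded(a, idx, c, unroll=3):
    """Emulates delayed stores with a run time check against the last `unroll` iterations."""
    mem = list(a)
    regs = []  # rotating "registers": (index, value) pairs, newest first
    for i, k in enumerate(idx):
        value = mem[k]  # early load; may miss stores still held in regs
        # Run time dependency check: newest matching prior iteration wins.
        for prev_k, prev_value in regs:
            if prev_k == k:
                value = prev_value
                break
        result = value + c[i]
        # Register rotation: the current result becomes the newest entry.
        regs.insert(0, (k, result))
        if len(regs) > unroll:
            old_k, old_value = regs.pop()  # oldest store finally reaches memory
            mem[old_k] = old_value
    # Drain the remaining delayed stores, oldest first.
    while regs:
        old_k, old_value = regs.pop()
        mem[old_k] = old_value
    return mem

a = [0, 0, 0, 0]
idx = [1, 1, 2, 1, 0, 1]  # repeated index 1 creates cross-iteration dependences
c = [1, 2, 3, 4, 5, 6]
print(scatter_add_forwarded(a, idx, c) == scatter_add_reference(a, idx, c))
```

Because every store older than `unroll` iterations has already been written back, a load that finds no match in the buffer is guaranteed to be current, which is why the emulated loop agrees with the sequential one.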



15. A method comprising:
transforming sparse array matrix code to perform a run time dependency check using a predetermined number of prior computed values;
software-pipelining the transformed sparse array matrix code to perform the run time dependency check in a current iteration using the predetermined number of prior computed values; and
software-pipelining to parallelize multiple iterations by overlapping execution of dependence loop instructions in the prior computed values to reduce recurrence initiation interval in the sparse array matrix based on the run time dependency check.



16. The method of claim 15, wherein software-pipelining the transformed sparse array matrix code comprises:
forming a predetermined number of variables based on a virtual unrolling factor;
initializing the formed predetermined number of variables;
loading the prior computed values into the predetermined number of variables;
assigning the prior computed values to a predetermined number of substantially adjacent registers; and
software-pipelining using the assigned prior computed values.

17. The method of claim 16, further comprising:
performing register rotation to include computed values obtained from the current iteration; and
repeating software-pipelining and using the register rotated computed values to reduce the recurrence initiation interval in a next iteration.

18. The method of claim 17, wherein the computed values obtained from the predetermined number of prior adjacent iterations are based on virtual unrolling by using a virtual unrolling factor.

19. The method of claim 18, wherein the virtual unrolling factor is three.

20. A method comprising:
transforming loop computations from a predetermined number of prior adjacent iterations in sparse arrays/matrices to current loads code to perform a run time dependency check using a predetermined number of prior computed values;
software-pipelining the transformed loop computations from the predetermined number of prior adjacent iterations to perform a run time dependency check in a current iteration using the predetermined number of prior computed values; and
parallelizing the loop computations using the prior computed values.

21. The method of claim 20, wherein the predetermined number of prior computed values is obtained from the predetermined number of prior adjacent iterations based on a virtual unrolling factor.

22. The method of claim 21, wherein the virtual unrolling factor is three.

23. The method of claim 20, wherein parallelizing the computations using the prior computed values comprises:
overlapping execution of dependence loop instructions in multiple dependent iterations in the sparse arrays/matrices using the software-pipelining.

24. The method of claim 20, further comprising:
assigning the prior computed values to a predetermined number of adjacent registers.

25. The method of claim 24, further comprising:
performing register rotation to include computed values obtained from the current iteration.



26. An article comprising a computer-readable medium which stores computer-executable instructions, the instructions causing a computer to:
perform a run time dependency check during a current iteration using prior computed values obtained from a predetermined number of previous adjacent iterations in a sparse array matrix; and
parallelize dependence loop instructions between the current and subsequent multiple iterations by using the computed values to make the dependence loop instructions into independent loop iterations.



27. The article comprising a computer-readable medium which stores computer-executable instructions of claim 26, wherein performing the run time dependency check comprises:
transforming loop computations from a predetermined number of prior adjacent iterations in sparse arrays/matrices to current loads code to perform a run time dependency check using a predetermined number of prior computed values; and
software-pipelining the transformed loop computations from the predetermined number of prior adjacent iterations to perform a run time dependency check in a current iteration using the predetermined number of prior computed values.

28. The article comprising a computer-readable medium which stores computer-executable instructions of claim 26, wherein the instructions further cause a computer to assign the prior computed values to a predetermined number of adjacent registers.

29. The article comprising a computer-readable medium which stores computer-executable instructions of claim 28, wherein the instructions further cause a computer to perform register rotation to include computed values obtained from the current iteration.



30. A system comprising:
a bus;
a processor coupled to the bus;
a memory coupled to the processor;
a network interface device;
wherein execution of loop computations in indirectly accessed sparse arrays/matrices uses software-pipelining to increase instruction-level parallelism and decrease initiation interval by performing:
transforming loop computations from a predetermined number of prior adjacent iterations in sparse arrays/matrices to current loads code to perform a run time dependency check using a predetermined number of prior computed values;
software-pipelining the transformed loop computations from the predetermined number of prior adjacent iterations to perform a run time dependency check in a current iteration using the predetermined number of prior computed values; and
parallelizing the loop computations using the prior computed values to reduce recurrence initiation interval in the undisambiguated pointer stores from the predetermined number of prior adjacent iterations to current loads code based on the run time dependency check.

31. The system of claim 30, wherein the processor further assigns the prior computed values to a predetermined number of adjacent registers.

32. The system of claim 31, wherein the processor further performs register rotation to include computed values obtained from the current iteration.

33. A method comprising software-pipelining a loop by dynamic run time disambiguation of elements of an index array using otherwise-idle issue slots in a software-pipelining schedule.

34. An article comprising a computer-readable medium that stores computer-executable instructions, the instructions causing a computer to software-pipeline a loop by dynamic run time disambiguation of elements of an index array using otherwise-idle issue slots in a software-pipelining schedule.